When using metacharacters like \w
without the Unicode flag or when using hard-coded character classes like [a-zA-Z]
,
letters outside of the ASCII range, such as umlauts, accented letters, or letters from non-Latin languages, won’t be matched. This may cause code to
incorrectly handle input containing such letters.
To correctly handle non-ASCII input, it is recommended to use the Unicode flag \u
or Unicode character properties like
\p{L}
.
Noncompliant code example
preg_match("/[a-zA-Z]/", "ö"); // returns 0
preg_match("/\w/", "ö"); // returns 0
Compliant solution
preg_match("/\w/u", "ö"); // returns 1
preg_match("/\p{L}/", "ö"); // return 1